TextRunner: Open Information Extraction on the Web
Abstract
Traditional information extraction systems have focused on satisfying precise, narrow, pre-specified requests from small, homogeneous corpora. In contrast, the TextRunner system demonstrates a new kind of information extraction, called Open Information Extraction (OIE), in which the system makes a single, data-driven pass over the entire corpus and extracts a large set of relational tuples without requiring any human input (Banko et al., 2007). TextRunner is a fully implemented, highly scalable example of OIE. Its extractions are indexed, which allows a fast query mechanism. Our first public demonstration of the TextRunner system shows the results of performing OIE on a set of 117 million web pages. It demonstrates the power of TextRunner in terms of the raw number of facts it has extracted, as well as its precision as measured by our novel assessment mechanism. It also shows the ability to automatically determine synonymous relations and objects using large sets of extractions. We have built a fast user interface for querying the results.
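To make the idea of indexed relational tuples concrete, here is a minimal sketch of a tuple store with a keyword lookup. It is not TextRunner's implementation: the names `Extraction` and `TupleIndex`, the three-field tuple layout, and the sample facts are all assumptions chosen for illustration, and the extraction step itself is not shown.

```python
from collections import defaultdict
from typing import NamedTuple


class Extraction(NamedTuple):
    """A relational tuple (hypothetical layout: two arguments and a relation)."""
    arg1: str
    relation: str
    arg2: str


class TupleIndex:
    """Toy inverted index from lowercase tokens to the tuples containing them."""

    def __init__(self):
        self._tuples = []                  # tuple id -> Extraction
        self._by_token = defaultdict(set)  # token -> set of tuple ids

    def add(self, extraction):
        tuple_id = len(self._tuples)
        self._tuples.append(extraction)
        # Index every whitespace-separated token of every field.
        for field in extraction:
            for token in field.lower().split():
                self._by_token[token].add(tuple_id)

    def query(self, *terms):
        """Return the tuples that contain all of the given terms."""
        if not terms:
            return []
        ids = None
        for term in terms:
            matches = self._by_token.get(term.lower(), set())
            ids = matches if ids is None else ids & matches
        return [self._tuples[i] for i in sorted(ids)]


# Made-up sample tuples, for illustration only.
index = TupleIndex()
index.add(Extraction("Edison", "invented", "the light bulb"))
index.add(Extraction("Bell", "invented", "the telephone"))

all_inventions = index.query("invented")
telephone_only = index.query("invented", "telephone")
```

Because each lookup is a set intersection over precomputed posting sets, query cost depends on the number of matching tuples and not on the size of the corpus, which is the general reason an index supports fast querying.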
Related resources
Open Information Extraction for the Web
[Figure labels: 13,810,000 Tuples; Primary Entities; Relations; Filtering] Figure 4.2: Open Extraction from Wikipedia: TextRunner extracts 32.5 million distinct assertions from 2.5 million Wikipedia articles. 6.1 million of these tuples represent concrete relationships between named entities. The ability to automatically detect synonymous facts about abstract entities...
Open Information Extraction Using Wikipedia
Information-extraction (IE) systems seek to distill semantic relations from natural-language text, but most systems use supervised learning of relation-specific examples and are thus limited by the availability of training data. Open IE systems such as TextRunner, on the other hand, aim to handle the unbounded number of relations found on the Web. But how well can these open systems perform? Thi...
Semantic Role Labeling for Open Information Extraction
Open Information Extraction is a recent paradigm for machine reading from arbitrary text. In contrast to existing techniques, which have used only shallow syntactic features, we investigate the use of semantic features (semantic roles) for the task of Open IE. We compare TEXTRUNNER (Banko et al., 2007), a state-of-the-art open extractor, with our novel extractor SRL-IE, which is based on UIUC’s...
Filtering Information Extraction via User-Contributed Knowledge
Large repositories of knowledge can enable more powerful AI systems. Information Extraction (IE) is one approach to building knowledge repositories by extracting knowledge from text. Open IE systems like TextRunner [Banko et al., 2007] are able to extract hundreds of millions of assertions from Web text. However, because of imperfections in extraction technology and the noisy nature of Web text...
Relational Web Search
Facts are naturally organized in terms of entities, classes, and their relationships as in an entity-relationship diagram or a semantic network. Search engines have eschewed such structures because, in the past, their creation and processing have not been practical at Web scale. This paper introduces the extraction graph, a textual approximation to an entity-relationship graph, which is automat...
Publication date: 2007